Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ignore: handling of broken symlinks #1503

Conversation

sharkdp
Copy link
Contributor

@sharkdp sharkdp commented Feb 28, 2020

This is maybe more of a question than a bug report.

Currently, the ignore crate has the following behavior:

  • If .follow_links(false): both broken and normal symlinks are reported as directory entries. This is as expected.
  • If .follow_links(true): normal symlinks are reported as directory entries (and traversed, if they point to a directory), broken symlinks are not reported (an ignore::Error::WithPath is returned from the iterator).

Question: in the follow_links(true) case, do we expect to see broken symlinks as normal directory entries?

I would argue yes, they should be reported. Broken symlinks are file-like directory entries which should show up when "walking a directory" with ignore. This is backed by finds behavior, which does report broken symlinks, even when -follow is present.

This is a PR which adds a (currently failing) unit test, which can be used as a basis to fix this, if desired. I tried to implement a fix myself, but didn't get very far:

diff --git a/crates/ignore/src/walk.rs b/crates/ignore/src/walk.rs
index 6fd7087..c7bf039 100644
--- a/crates/ignore/src/walk.rs
+++ b/crates/ignore/src/walk.rs
@@ -958,6 +958,13 @@ impl Iterator for Walk {
             };
             match ev {
                 Err(err) => {
+                    if err.io_error().map_or(false, |io_error| {
+                        io_error.kind() == io::ErrorKind::NotFound
+                    }) {
+                        // TODO: make sure this is actually caused by a broken symlink
+                        // TODO: how to create a `DirEntry` from `io_error`? We only have
+                        // io_error.path().
+                    }
                     return Some(Err(Error::from_walkdir(err)));
                 }
                 Ok(WalkEvent::Exit) => {

@BurntSushi
Copy link
Owner

Interesting question. It kind of feels to me like the current API is probably right though? The user is asking for symlinks to be followed, so ignore will do just that. If an error occurs while attempting that, then that really should be reported as an error. This is also consistent with how the rest of the API is designed and is probably why you're having trouble creating a DirEntry. Namely, methods like file_type() and metadata() return the relevant value for the target of the link when follow_links is enabled. (Technically this isn't documented behavior in ignore, but it should be, and is documented behavior in walkdir.)

So if the caller asked to follow symlinks, and you yield a dir entry with a symlink that could not be followed, then how are methods like metadata and file_type implemented? The only things you can really do here are return an error or not follow the symlink. But this could lead to surprising behavior. If you asked the traversal to follow symlinks, then I think it's reasonable to expect that every dir entry is not a symlink. If the symlink couldn't be followed, then that really seems like a proper error to me.

Moreover, if we decided to return a DirEntry instead, and a symlink couldn't be followed because an error occurred, then how is that error reported to the caller? It sounds like you'd either have to squash it or otherwise attach the error to the DirEntry. Which seems not great.

Your point about find is well taken, but it seems to me like you have all the tools with the current API to implement that behavior, no? That is, if your application wants to emit broken symlinks even when the user asked to follow them, then it should be possible to inspect the error. Maybe the problem here is that you don't have a way to detect whether the error is due to being unable to follow the symlink, then I'd be in favor of adding a method or something to detect that case specifically on the error type. Would that work?

@sharkdp
Copy link
Contributor Author

sharkdp commented Feb 28, 2020

Thank you so much for your detailed response!

This is also consistent with how the rest of the API is designed [...]. Namely, methods like file_type() and metadata() return the relevant value for the target of the link when follow_links is enabled.

Right. I hadn't thought about that.

So if the caller asked to follow symlinks, and you yield a dir entry with a symlink that could not be followed, then how are methods like metadata and file_type implemented?

According to find, a broken symlink still has a "file type" of "symlink":

❯ tree
.
├── broken_link -> non-existing
├── dir
└── normal_link -> dir

❯ find -type l        
./normal_link
./broken_link

❯ find -follow -type l
./broken_link

So in this sense, I would expect a DirEntry with a std::fs::FileType that returns true for .is_symlink().

Since .metadata() returns a Result<MetaData>, I would expect an Err(…) in this case. This would be consistent with what ls does (prints a warning, but shows the entry with unknown metadata):

❯ ls -l   
total 0
lrwxrwxrwx 1 shark lxd 12 Feb 28 17:14 broken_link -> non-existing
drwxr-xr-x 2 shark lxd 40 Feb 28 17:14 dir
lrwxrwxrwx 1 shark lxd  3 Feb 28 17:13 normal_link -> dir

/tmp/fooo                                                                        
❯ ls -l -L
ls: cannot access 'broken_link': No such file or directory
total 0
l????????? ? ?     ?    ?            ? broken_link
drwxr-xr-x 2 shark lxd 40 Feb 28 17:14 dir
drwxr-xr-x 2 shark lxd 40 Feb 28 17:14 normal_link

If you asked the traversal to follow symlinks, then I think it's reasonable to expect that every dir entry is not a symlink. If the symlink couldn't be followed, then that really seems like a proper error to me.

Moreover, if we decided to return a DirEntry instead, and a symlink couldn't be followed because an error occurred, then how is that error reported to the caller? It sounds like you'd either have to squash it or otherwise attach the error to the DirEntry. Which seems not great.

Agreed, I see your point.

Your point about find is well taken, but it seems to me like you have all the tools with the current API to implement that behavior, no? That is, if your application wants to emit broken symlinks even when the user asked to follow them, then it should be possible to inspect the error.

We had two failed attempts at fixing this bug (sharkdp/fd#357) in fd (sharkdp/fd#366, sharkdp/fd#497), but that certainly doesn't mean it's not possible. I will try once again myself.

Maybe the problem here is that you don't have a way to detect whether the error is due to being unable to follow the symlink, then I'd be in favor of adding a method or something to detect that case specifically on the error type. Would that work?

We currently get a ignore::Error::WithPath with an internal io::Error of kind io::ErrorKind::NotFound ("An entity was not found, often a file."). Subsequently, we can check if the path inside the error is a symlink (with std::fs::symlink_metadata, i.e. without following the link). This way, I think we should actually be able to implement this.

@sharkdp
Copy link
Contributor Author

sharkdp commented Feb 28, 2020

I found a way to make this work: sharkdp/fd@cc08493...9fe73ba

I think we can close this(?). Maybe the only thing to note is that users of ignore which want to implement a ls or find-like tool will maybe not have the optimal API to handle broken symlinks. But I don't think this is enough of a reason to make a breaking change to ignores current behavior. This is a rather weird use case anyway (combination of follow and broken symlinks).

@lzm0
Copy link

lzm0 commented Mar 7, 2022

As you probably know, VS Code depends on ripgrep to find files by name: https://github.com/microsoft/vscode/blob/ab49f4f310a014c9e03839f37c1ca3c48196aeca/src/vs/workbench/services/search/node/fileSearch.ts#L200
Therefore, VS Code cannot find broken symbolic links in a workspace due to this issue.
I just filed a bug to them: microsoft/vscode#144610
I looked into this issue on my own and traced it down to here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants